A Graph-Based Soft Actor Critic Approach in Multi-Agent Reinforcement Learning
Authors
Abstract
Multi-Agent Reinforcement Learning (MARL) is widely used to solve various real-world problems. In MARL, the environment contains multiple agents, and a good grasp of the whole environment can guide agents to learn cooperative strategies. Under Centralized Training with Decentralized Execution (CTDE), a centralized critic guides strategy learning. However, feeding the critic the joint information of all agents leads to the curse of dimensionality, and agents continually influence each other's strategies, which makes the critic difficult to train. We propose a graph-based approach to overcome these problems. It uses a graph neural network that takes partial observations as input; the information passed between agents is aggregated by graph methods to extract knowledge about the whole environment. In this way, agents improve their understanding of the overall state while avoiding a dimensional explosion. We then combine a dual dynamic decomposition method with soft actor-critic to train the policy: the former decomposes the global reward into individual rewards to aid learning, while the latter helps learn an optimal policy. We call our method Multi-Agent Graph-based Actor-Critic (MAGAC) and compare it with several classical MARL algorithms in the Multi-agent Particle Environment (MPE). The experimental results show that our method achieves faster learning and better final performance.
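As a rough illustration of the graph-based aggregation described above, the sketch below (PyTorch; names such as GraphObsEncoder and the mean-pooling choice are hypothetical, not taken from the paper) embeds each agent's partial observation and averages neighbour embeddings over a communication graph, giving every agent a fixed-size summary of the whole environment:

    # Hypothetical sketch of graph-based observation aggregation; the
    # paper's exact GNN architecture may differ.
    import torch
    import torch.nn as nn

    class GraphObsEncoder(nn.Module):
        """Embed each agent's partial observation, then mean-pool
        neighbour embeddings to approximate the global state."""
        def __init__(self, obs_dim: int, hidden_dim: int = 64):
            super().__init__()
            self.embed = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
            self.combine = nn.Sequential(
                nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU())

        def forward(self, obs: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
            # obs: (n_agents, obs_dim); adj: (n_agents, n_agents) 0/1 adjacency
            h = self.embed(obs)                              # (n, hidden)
            deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid divide-by-zero
            neighbour_mean = (adj @ h) / deg                 # mean over neighbours
            # concatenate own embedding with the aggregated message
            return self.combine(torch.cat([h, neighbour_mean], dim=-1))

    # Usage: three agents on a fully connected graph without self-loops.
    enc = GraphObsEncoder(obs_dim=10)
    obs = torch.randn(3, 10)
    adj = torch.ones(3, 3) - torch.eye(3)
    state_emb = enc(obs, adj)  # (3, 64) per-agent summaries of the environment

Because the aggregation is a mean, the summary size stays fixed as the number of agents grows, which is what lets the centralized critic sidestep the dimensional explosion mentioned in the abstract.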
Similar resources
ACCNet: Actor-Coordinator-Critic Net for "Learning-to-Communicate" with Deep Multi-agent Reinforcement Learning
Communication is a critical factor for the big multi-agent world to stay organized and productive. Typically, most multi-agent "learning-to-communicate" studies try to predefine the communication protocols or use technologies such as tabular reinforcement learning and evolutionary algorithms, which cannot generalize to changing environments or large collections of agents. In this paper, we propos...
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods t...
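For context, the maximum-entropy objective that soft actor-critic maximizes is standardly written as

    J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],

where the temperature \alpha trades off reward against policy entropy; a larger \alpha keeps the policy stochastic, which underlies the improved exploration and stability the abstract refers to.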
Dynamic Control with Actor-Critic Reinforcement Learning
Contents (excerpt): 4 Actor-Critic Marble Control; 4.1 R-code; 4.2 The critic; 4.3 Unstable actors; 4.4 Trading off stability against...
Supervised Actor-Critic Reinforcement Learning
Editor’s Summary: Chapter ?? introduced policy gradients as a way to improve on stochastic search of the policy space when learning. This chapter presents supervised actor-critic reinforcement learning as another method for improving the effectiveness of learning. With this approach, a supervisor adds structure to a learning problem and supervised learning makes that structure part of an actor-...
Actor-Critic Reinforcement Learning with Energy-Based Policies
We consider reinforcement learning in Markov decision processes with high dimensional state and action spaces. We parametrize policies using energy-based models (particularly restricted Boltzmann machines), and train them using policy gradient learning. Our approach builds upon Sallans and Hinton (2004), who parameterized value functions using energy-based models, trained using a non-linear var...
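The energy-based policies referred to here take the standard Boltzmann form (a general statement of the approach, not quoted from the paper):

    \pi(a \mid s) = \frac{\exp(-E(s, a))}{\sum_{a'} \exp(-E(s, a'))},

where E(s, a) is the (free) energy the model assigns to taking action a in state s; for a restricted Boltzmann machine, E is the free energy with (s, a) clamped to the visible units.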
Journal
Journal title: International Journal of Computers Communications & Control
Year: 2023
ISSN: 1841-9844, 1841-9836
DOI: https://doi.org/10.15837/ijccc.2023.1.5062